Abstract

We analyze the general version of the classic guessing game Mastermind with $n$ positions
and $k$ colors. Since the case $k \le n^{1-\varepsilon}$, $\varepsilon > 0$ a constant, is well understood, we concentrate
on larger numbers of colors. For the most prominent case $k = n$, our results imply that
Codebreaker can find the secret code with $O(n \log\log n)$ guesses. This bound is valid also
when only black answer-pegs are used. It improves the $O(n \log n)$ bound first proven by
Chvátal (Combinatorica 3 (1983), 325-329). We also show that if both black and white
answer-pegs are used, then the $O(n \log\log n)$ bound holds for up to $n^2 \log\log n$ colors. These
bounds are almost tight as the known lower bound of $\Omega(n)$ shows. Unlike for $k \le n^{1-\varepsilon}$,
simply guessing at random until the secret code is determined is not sufficient. In fact, we
show that any non-adaptive strategy needs an expected number of $\Omega(n \log n)$ guesses.

Keywords: Mastermind; query complexity; combinatorial games; randomized algorithms. 



1 Introduction 



Mastermind (see Section 1.1 for the rules) and other guessing games like liar games [Pel02, Spe94]
have attracted the attention of computer scientists not only because of their playful nature, but
more importantly because of their relation to fundamental complexity and information-theoretic
questions. In fact, Mastermind with two colors was first analyzed by Erdős and Rényi [ER63]
in 1963, several years before the release of Mastermind as a commercial board game.

Since then, intensive research by various scientific communities has produced a plethora of results
on various aspects of the Mastermind game (see also the literature review in Section 1.4).



In a famous 1983 paper, Chvátal [Chv83] determined, precisely up to constant factors, the
asymptotic number of queries needed on a board of size $n$ for all numbers $k$ of colors with
$k \le n^{1-\varepsilon}$, $\varepsilon > 0$ a constant. Interestingly, a very simple guessing strategy suffices, namely asking
random guesses until the answers uniquely determine the secret code.

Surprisingly, for larger numbers of colors, no sharp bounds exist. In particular for the
natural case of $n$ positions and $k = n$ colors, Chvátal's bounds $O(n \log n)$ and $\Omega(n)$ from 1983
are still the best known asymptotic results.

We almost close this gap, which has been open for roughly 30 years, and prove that Codebreaker
can solve the $k = n$ game using only $O(n \log\log n)$ guesses. This bound, like Chvátal's, even holds
for black-pegs only Mastermind. When white answer-pegs are used as well, we obtain a similar
improvement from the previous-best $O(n \log n)$ bound to $O(n \log\log n)$ for all $n \le k \le n^2 \log\log n$.

1.1 Mastermind 



Mastermind is a two-player board game invented in the seventies by the Israeli telecommunication
expert Mordechai Meirowitz. The first player, called Codemaker here, privately chooses
a color combination of four pegs. Each peg can be chosen from a set of six colors. The goal of
the second player, Codebreaker, is to identify this secret code. To do so, he guesses arbitrary
length-4 color combinations. For each such guess he receives information on how close his guess
is to Codemaker's secret code. Codebreaker's aim is to use as few guesses as possible.

Figure 1: A typical round of Mastermind

Besides the original 4-position 6-color Mastermind game, various versions with other numbers
of positions or colors are commercially available. The scientific community, naturally, often
regards a generalized version with $n$ positions and $k$ colors (according to Chvátal [Chv83], this
was first suggested by Pierre Duchet). For a precise description of this game, let us denote by
$[k]$ the set $\{1, \dots, k\}$ of positive integers not exceeding $k$. At the start of the game, Codemaker
chooses a secret code $z \in [k]^n$. In each round, Codebreaker guesses a string $x \in [k]^n$. Codemaker
replies with the number $\mathrm{eq}(z,x) := |\{i \in [n] \mid z_i = x_i\}|$ of positions in which her and
Codebreaker's string coincide, and with $\pi(z,x)$, the number of additional pegs having the right color,
but being in the wrong position. Formally, $\pi(z,x) := \max_{\rho \in S_n} |\{i \in [n] \mid z_i = x_{\rho(i)}\}| - \mathrm{eq}(z,x)$,
where $S_n$ denotes the set of all permutations of the set $[n]$. In the original game, $\mathrm{eq}(z,x)$ is
indicated by black answer-pegs, and $\pi(z,x)$ is indicated by white answer-pegs. Based on this
and all previous answers, Codebreaker may choose his next guess. He "wins" the game if his
guess equals Codemaker's secret code.
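
The two answer functions can be sketched in Python as follows. This is an illustration only: it assumes codes are given as equal-length sequences of integers, the names `eq` and `pi` merely mirror the notation above, and the multiset-intersection shortcut used for the maximum over permutations is a standard identity rather than something stated in the text.

```python
from collections import Counter


def eq(z, x):
    """Black pegs: number of positions where secret z and guess x coincide."""
    return sum(1 for zi, xi in zip(z, x) if zi == xi)


def pi(z, x):
    """White pegs: pegs of the right color that sit in the wrong position.

    The maximum over all permutations equals the size of the multiset
    intersection of the colors of z and x, so no search is needed.
    """
    common = sum((Counter(z) & Counter(x)).values())
    return common - eq(z, x)


z = (1, 2, 3, 4)   # hypothetical secret code
x = (1, 3, 2, 5)   # hypothetical guess
print(eq(z, x), pi(z, x))
```

In the black-pegs only variant discussed next, Codebreaker would see only the value of `eq`.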

We should note that often, and partially also in this work, a black-pegs only variant is
studied, in which Codemaker reveals $\mathrm{eq}(z,x)$ but not $\pi(z,x)$. This is justified both by several
applications (see Section 1.4) and by the insight that, in particular for small numbers of colors,
the white answer-pegs do not significantly improve Codebreaker's situation (see Section 3).

1.2 Previous Results

Mastermind has been studied intensively in the mathematics and computer science literature.
For the original 4-position 6-color version, Knuth [Knu77] has given a deterministic strategy
that wins the game in at most five guesses. He also showed that no deterministic strategy has
a 4-round guarantee.

The generalized $n$-position $k$-color version was investigated by Chvátal [Chv83]. He first
noted that a simple information-theoretic argument (attributed to Pierre Duchet) provides a
lower bound of $\Omega(n \log k / \log n)$ for any $k = k(n)$.

Extending the result [ER63] of Erdős and Rényi from $k = 2$ to larger numbers of colors, he
then showed that for any fixed $\varepsilon > 0$, $n$ sufficiently large and $k \le n^{1-\varepsilon}$, repeatedly asking random
guesses until all but the secret code are excluded by the answers is an optimal Codebreaker
strategy (up to constant factors). More specifically, using the probabilistic method and random
guesses, he showed the existence of a deterministic non-adaptive strategy for Codebreaker, that
is, a set of $(2 + \varepsilon) n \frac{1 + 2\log k}{\log n - \log k}$ guesses such that the answers uniquely determine any secret code
Codemaker might have chosen (here and in the remainder, $\log n$ denotes the binary logarithm
of $n$). These bounds hold even in the black-pegs only version of the game.

For larger values of $k$, the situation is less understood. Note that the information-theoretic
lower bound is $\Omega(n)$ for any number $k = n^{\alpha}$, $\alpha > 0$ a constant, of colors. For $k$ between $n$ and $n^2$,
Chvátal presented a deterministic adaptive strategy using $2n \log k + 4n$ guesses. For $k = n$, this
strategy does not need white answer-pegs. Chvátal's result has been improved subsequently.
Chen, Cunha, and Homer [CCH96] showed that for any $k \ge n$, $2n\lceil \log n \rceil + 2n + \lceil k/n \rceil + 2$
guesses suffice. Goodrich [Goo09b] proved an upper bound of $n\lceil \log k \rceil + \lceil (2 - 1/k)n \rceil + k$ for the
number of guesses needed to win the Mastermind game with an arbitrary number $k$ of colors
and black answer-pegs only. Note that for the case of $k = n$ colors and positions, all these
results give the same asymptotic bound of $O(n \log n)$.

1.3 Our Contribution 

The results above show that Mastermind is well understood for $k \le n^{1-\varepsilon}$, where we know
the correct number of queries apart from constant factors. In addition, a simple non-adaptive 
guessing strategy suffices to find the secret code, namely casting random guesses until the code 
is determined by the answers. 

On the other hand, for $k = n$ and larger, the situation is less clear. The best known upper
bound, which is $O(n)$ (and tight) for $k = n^{\alpha}$, $0 < \alpha < 1$ a constant, suddenly increases to
$O(n \log n)$ for $k = n$, while the information-theoretic lower bound remains at $\Omega(n)$.

In this work, we prove that indeed there is a change of behavior around $k = n$. We show
that the random guessing strategy, and, in fact, any other non-adaptive strategy, cannot find
the secret code with an expected number of less than $\Theta(n \log n)$ guesses. This can be proven
quite easily via an entropy compression argument as used by Moser [Mos09].

The main contribution of our work is a (necessarily adaptive) strategy that for $k = n$ finds
the secret code with only $O(n \log\log n)$ queries. This reduces the $O(\log n)$ gap to $O(\log\log n)$.
Like the previous strategies for $k \le n$, our new one does not use white answer-pegs. Our
strategy also improves the current best bounds for other values of $k$ in the vicinity of $n$; see
Theorem 1 below for the precise result.

The central part of our guessing strategy is setting up suitable coin-weighing problems,
solving them, and using the solution to rule out the possible occurrence of some colors at some
positions. By a result of Grebinski and Kucherov [GK00], these coin-weighing problems can be
solved by relatively few independent random weighings.

While our strategy thus is guided by probabilistic considerations, it can be derandomized
to obtain a deterministic $O(n \log\log n)$ strategy for black-peg Mastermind with $k = n$ colors.
Moreover, appealing to an algorithmic result of Bshouty [Bsh09] instead of Grebinski and
Kucherov's result, we obtain a strategy that can be realized as a deterministic polynomial-time
codebreaking algorithm.

We also improve the current-best bounds for Mastermind with black and white answer-pegs,
which stand at $O(n \log n)$ for $n \le k \le n^2$. For these $k$, we prove that $O(n \log\log n)$
guesses suffice. We point out that this improvement is not an immediate consequence of our
$O(n \log\log n)$ bound for $k = n$ black-peg Mastermind. Reducing the number of colors from $k$ to
$n$ is a non-trivial sub-problem as well. For example, when $k \ge n^{1+\varepsilon}$, Chvátal's strategy for the
game with black and white answer-pegs also uses $\Theta(n \log n)$ guesses to reduce the number of
colors from $k$ to $n$, before employing a black-peg strategy to finally determine the secret code.



1.4 Related Work 



A number of results exist on the computational complexity of evaluating given guesses and
answers. Stuckman and Zhang showed that it is NP-hard to decide whether there exists a secret
code consistent with a given sequence of queries and black- and white-peg answers [SZ06]. This
result was extended to black-peg Mastermind by Goodrich [Goo09b]. More recently, Viglietta
showed that both hardness results apply also to the setting with only $k = 2$ colors [Vig12]. In
addition, he proved that counting the number of consistent secret codes is #P-complete.

Another intensively studied question in the literature concerns the computation of (explicit)
optimal winning strategies for small values of $n$ and $k$. As described above, the foundation for
these works was laid by Knuth's famous paper [Knu77] for the case with $n = 4$ positions and
$k = 6$ colors. His strategy is worst-case optimal. Koyama and Lai [KL93] studied the average-case
difficulty of Mastermind. They gave a strategy that solves Mastermind in an expected
number of about 4.34 guesses if the secret string is sampled uniformly at random from all $6^4$
possibilities, and they showed that this strategy is optimal. Today, a number of worst-case
and average-case optimal winning strategies for different (small) values of $n$ and $k$ are known,
both for the black- and white-peg version of the game [God04, JP09] and for the black-peg
version [JP11]. Non-adaptive strategies for specific values of $n$ and $k$ were studied in [God03].



In the field of computational intelligence, Mastermind is used as a benchmark problem. For
several heuristics, among them genetic and evolutionary algorithms, it has been studied how
well they play Mastermind [KC03, TK03, BGL09, GCG11, GMC11].



Trying to understand the intrinsic difficulty of a problem for such heuristics, Droste, Jansen,
and Wegener [DJW06] suggested using a query complexity variant (called black-box complexity).
For the so-called onemax test-function class, an easy benchmark problem in the field of
evolutionary computation, the black-box complexity problem is just the Mastermind problem for
two colors. This inspired, among others, the result [DW12] showing that a memory-restricted
version of Mastermind (using only two rows of the board) can still be solved in $O(n / \log n)$
guesses when the number of colors is constant.



Several privacy problems have been modeled via the Mastermind game. Goodrich [Goo09a]
used black-peg Mastermind to study the extent of private genomic data leaked by comparing
DNA-sequences (even when using protocols only revealing the degree of similarity). Focardi
and Luccio [FL10] showed that certain API-level attacks on user PIN data can be seen as an
extended Mastermind game.



1.5 Organization of this paper 

We describe and analyze our $O(n \log\log n)$ strategy for $k = n$ colors in Section 2. In Section 3
we present a strong connection between the black-pegs only and the classic (black and white
pegs) version of Mastermind. This yields, in particular, the claimed bound of $O(n \log\log n)$ for
the classic version with $n \le k \le n^2 \log\log n$ colors. In Section 4 we present our lower bound
result for non-adaptive strategies.






2 The $O(n \log\log n)$ Adaptive Strategy



In this section we present the main contribution of this work, a black-pegs only strategy that
solves Mastermind with $k = n$ colors in $O(n \log\log n)$ queries. We state our results for an
arbitrary number $k = k(n)$ of colors; they improve upon the previously known bounds for all
$k = o(n \log n)$ with $k \ge n^{1-\varepsilon}$ for every fixed $\varepsilon > 0$.

Theorem 1. For Mastermind with $n$ positions and $k = k(n)$ colors, the following holds.

• If $k = \Omega(n)$ then there exists a randomized winning strategy that uses black pegs only and
needs an expected number of $O(n \log\log n + k)$ guesses.

• If $k = o(n)$ then there exists a randomized winning strategy that uses black pegs only and
needs an expected number of $O\left(n \log\left(\frac{\log n}{\log(n/k)}\right)\right)$ guesses.

The $O$-notation in Theorem 1 only hides absolute constants. Note that, setting $k =: n^{1-\delta}$,
$\delta = \delta(n)$, we have $\log(n/k) = \delta \log n$, so the bound for $k = o(n)$ translates to $O(n \log(\delta^{-1}))$.

We describe our strategy and prove Theorem 1 in Sections 2.1 to 2.3. We discuss the
derandomization of our strategy in Section 2.4.



2.1 Main Ideas 

Our goal in this section is to give an informal sketch of our main ideas, and to outline how the
$O(n \log\log n)$ bound for $k = n$ arises. For the sake of clarity, we nevertheless present our ideas
in the general setting; it will be useful to distinguish between $k$ and $n$ notationally. As justified
in Section 2.2.1 below, we assume that $k \le n$ and that both $k$ and $n$ are powers of two.

A simple but crucial observation is that when we query a string $x \in [k]^n$ and the answer
$\mathrm{eq}(z,x)$ is 0 (recall that $z$ denotes Codemaker's secret color code), then we know that all queried
colors are wrong for their respective positions; i.e., we have $z_i \ne x_i$ for all $i \in [n]$. To make
use of this observation, we maintain, for each position $i$, a set $C_i \subseteq [k]$ of colors that we still
consider possible at position $i$. Throughout our strategy we reduce these sets successively, and
once $|C_i| = 1$ for all $i \in [n]$ we have identified the secret code $z$. Variants of this idea have been
used by several previous authors [Chv83, Goo09b].
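
The bookkeeping behind this observation can be sketched as follows. The helper name `eliminate_on_zero` and the concrete secret and guess are illustrative assumptions; the candidate sets are stored as one Python `set` per position.

```python
def eliminate_on_zero(candidates, guess, answer):
    """If eq(z, guess) == 0, every guessed color is wrong at its position."""
    if answer == 0:
        for c_i, g_i in zip(candidates, guess):
            c_i.discard(g_i)
    return candidates


n, k = 4, 6
z = [3, 1, 6, 2]                                   # hypothetical secret code
C = [set(range(1, k + 1)) for _ in range(n)]       # C_i = [k] initially
guess = [1, 2, 3, 4]
answer = sum(a == b for a, b in zip(z, guess))     # eq(z, guess)
eliminate_on_zero(C, guess, answer)
print(answer, C)
```

A nonzero answer carries no such position-wise information on its own, which is what motivates the block construction described next.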



Our strategy proceeds in phases. In each phase we reduce the size of all sets $C_i$ by a factor
of two. Thus, before the $j$th phase we will have $|C_i| \le k/2^{j-1}$ for all $i \in [n]$. Consider now
the beginning of the $j$th phase, and assume that all sets $C_i$ have size exactly $k' := k/2^{j-1}$.
Imagine we query a random string $r$ sampled uniformly from $C_1 \times \dots \times C_n$. The expected
value of $\mathrm{eq}(z,r)$ is $n/k'$, and the probability that $\mathrm{eq}(z,r) = 0$ is $(1 - 1/k')^n \le e^{-n/k'}$. If $k'$ is
significantly smaller than $n$, this probability is very small, and we will not see enough 0-answers
to exploit the simple observation we made above. However, if we group the $n$ positions into
$m := 4n/k'$ blocks of equal size $k'/4$, the expected contribution of each such block is $1/4$, and
the probability that a fixed such block contributes 0 to $\mathrm{eq}(z,r)$ is $(1 - 1/k')^{k'/4} \approx e^{-1/4}$, i.e.,
constant. We will refer to blocks that contribute 0 to $\mathrm{eq}(z,r)$ as 0-blocks in the following. For
a random query we expect a constant fraction of all $m$ blocks to be 0-blocks. If we can identify
which blocks these are, we can rule out a color at each position of each such block and make
progress towards our goal.
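
The contrast between the whole string and a single block is easy to check numerically. The sketch below only evaluates the two closed-form probabilities from the paragraph above; the function names and the sample values of $n$ and $k'$ are assumptions chosen for illustration.

```python
import math


def zero_answer_probability(n, k_prime):
    """Probability that a uniformly random query scores eq = 0 on all n positions."""
    return (1 - 1 / k_prime) ** n


def zero_block_probability(k_prime):
    """Probability that a fixed block of k'/4 positions contributes 0."""
    return (1 - 1 / k_prime) ** (k_prime / 4)


n = 1024
for k_prime in (8, 64, 1024):
    print(k_prime,
          zero_answer_probability(n, k_prime),
          zero_block_probability(k_prime),
          math.exp(-0.25))
```

For $k'$ much smaller than $n$ the first probability is negligible, while the block probability stays close to $e^{-1/4}$.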

As it turns out, the identification of the 0-blocks can be reduced to a coin-weighing problem
that has been studied by several authors; see [GK00, Bsh09] and references therein. Specifically,
we are given $m$ coins of unknown integer weights and a spring scale. We can use the spring
scale to determine the total weight of an arbitrary subset of coins in one weighing. Our goal is
to identify the weight of every coin with as few weighings as possible.

In our setup, the 'coins' are the blocks we introduced above, and the 'weight' of each block
is its contribution to eq(z, r). To simulate weighings of subsets of coins by Mastermind queries, 
we use 'dummy colors' for some positions, i.e., colors that we already know to be wrong at these 
positions. Using these, we can simulate the weighing of a subset of coins (=blocks) by copying 
the entries of the random query r in blocks that correspond to coins we wish to include in our 
subset, and by using dummy colors for the entries of all other blocks. 

Note that the total weight of our 'coins' is $\mathrm{eq}(z,r)$. Typically this value will be close to its
expectation $n/k'$, and therefore of the same order of magnitude as the number of blocks $m$. It
follows from a coin-weighing result by Grebinski and Kucherov [GK00] that $O(m / \log m)$ random
queries (of the described block form, simulating the weighing of a random subset of coins)
suffice to determine the contribution of each block to $\mathrm{eq}(z,r)$ with some positive probability.
As observed before, typically a constant fraction of all blocks contribute 0 to $\mathrm{eq}(z,r)$, and
therefore we may exclude a color at a constant fraction of all $n$ positions at this point.

Repeating this procedure of querying a random string $r$ and using additional 'random coin-weighing
queries' to identify the 0-blocks eventually reduces the sizes of the sets $C_i$ below $k'/2$,
at which point the phase ends. In total this requires $\Theta(k')$ rounds in which everything works out
as sketched, corresponding to a total number of $\Theta(k' \cdot (m / \log m)) = \Theta(n / \log(4n/k'))$ queries
for the entire phase.

Summing over all phases, this suggests that for $k = n$ a total number of

$$\sum_{j=1}^{\log n} O\left(\frac{n}{\log(4 \cdot 2^{j-1})}\right) = O(n) \sum_{j=1}^{\log n} \frac{1}{j} = O(n \log\log n)$$

queries suffice to determine the secret code $z$, as claimed in Theorem 1 for $k = n$.
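
The harmonic-sum behaviour of this estimate can be checked numerically. The sketch below is a heuristic illustration only: it sums the per-phase estimate $n / \log(4n/k')$ over all phases $j = 1, \dots, \log n$ for $k = n$, drops all hidden constants, and compares the result with $n \ln\log_2 n$. The function name `phase_query_estimate` is an assumption.

```python
import math


def phase_query_estimate(n):
    """Sum of n / log2(4n / k') over phases j = 1..log2(n), where k' = n / 2**(j-1)."""
    total = 0.0
    for j in range(1, int(math.log2(n)) + 1):
        k_prime = n / 2 ** (j - 1)
        total += n / math.log2(4 * n / k_prime)
    return total


for n in (2**10, 2**16, 2**20):
    ratio = phase_query_estimate(n) / (n * math.log(math.log2(n)))
    print(n, round(ratio, 3))
```

The printed ratios stay bounded as $n$ grows, consistent with a total of order $n \log\log n$.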

We remark that our precise strategy, Algorithm 1, slightly deviates from this description.
This is due to a technical issue with our argument once the number $k'$ of remaining colors drops
below $C \log n$ for some $C > 0$. Specifically, beyond this point the error bound we derive for a
fixed position is not strong enough to beat a union bound over all $n$ positions. To avoid this
issue, we stop our color reduction scheme before $k'$ becomes that small (for simplicity as soon
as $k'$ is less than $\sqrt{n}$), and solve the remaining Mastermind problem by asking random queries
from the remaining set $C_1 \times \dots \times C_n$, as originally proposed by Erdős and Rényi [ER63] and
Chvátal [Chv83].



2.2 Precise Description of Codebreaker's Strategy 
2.2.1 Assumptions on n and k, Dummy Colors 

Let us now give a precise description of our strategy. We begin by determining a dummy
color for each position, i.e., a color that we know to be wrong at that particular position. For
this we simply query the $n + 1$ many strings $(1, 1, \dots, 1), (2, 1, 1, \dots, 1), \dots, (2, 2, \dots, 2) \in [k]^n$.
Processing the answers to these queries in order, it is not hard to determine the location of all
1's and 2's in Codemaker's secret string $z$. In particular, this provides us with a dummy color
for each position.
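
One way to process these $n + 1$ answers is sketched below. The text leaves the details to the reader, so the decoding rule (comparing consecutive answers), the function name `find_dummy_colors`, and the `oracle` callback that returns $\mathrm{eq}(z,x)$ are assumptions made for illustration; the sketch also assumes $k \ge 2$.

```python
def find_dummy_colors(n, oracle):
    """Return (dummy, known) from the queries (1,..,1), (2,1,..,1), ..., (2,..,2).

    oracle(x) must return eq(z, x). dummy[i] is a color known to be wrong at
    position i; known[i] is 1 or 2 if z[i] is that color, and None otherwise.
    """
    query = [1] * n
    prev = oracle(query)
    dummy, known = [], []
    for i in range(n):
        query[i] = 2                  # flip position i from color 1 to color 2
        cur = oracle(query)
        if cur - prev == 1:           # the answer went up: z[i] == 2
            known.append(2)
            dummy.append(1)
        elif cur - prev == -1:        # the answer went down: z[i] == 1
            known.append(1)
            dummy.append(2)
        else:                         # unchanged: z[i] is neither 1 nor 2
            known.append(None)
            dummy.append(1)
        prev = cur
    return dummy, known


z = [3, 1, 2, 5, 1]                   # hypothetical secret code


def oracle(x):
    return sum(a == b for a, b in zip(z, x))


dummy, known = find_dummy_colors(len(z), oracle)
print(dummy, known)
```

Each consecutive pair of queries differs in exactly one position, so the change in the answer reveals whether that position holds color 1, color 2, or neither.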

Next we argue that for the main part of our argument we may assume that n and k are 
powers of two. To see this for n, note that we can simply extend Codemaker's secret string in 
an arbitrary way such that its length is the smallest power of two larger than n, and pretend 
we are trying to determine this extended string. To get the answers to our queries in this 
extended setting, we just need to add the contribution of the self-made extension part (which 
we determine ourselves) to the answers Codemaker provides for the original string. As the 
extension changes n at most by a factor of two, our claimed asymptotic bounds are unaffected
by this. 

To argue that we may also assume $k$ to be a power of two, we make use of the dummy colors
we already determined for the original value of k. Similar to the previous argument, we increase 
k to the next power of two and consider the game with this larger number of colors. To get 
the answers to our queries in this extended setting from Codemaker (who still is in the original 
setting), it suffices to replace every occurrence of a color that is not in the original color set 
with the dummy color at the respective position. 

We may and will also assume that $k \le n$. If $k > n$ we can trivially reduce the number
of colors to $n$ by making $k$ monochromatic queries. With this observation the first part of
Theorem 1 follows immediately from the $O(n \log\log n)$ bound we prove for the case $k = n$.

2.2.2 Eliminating Colors via Coin-Weighing Queries

With these technicalities out of the way, we can focus on the main part of our strategy. As
sketched above, our strategy operates in phases, where in the $j$th phase we reduce the sizes of
the sets $C_i$ from $k/2^{j-1}$ to $k/2^j$. For technical reasons, we do not allow the sizes of $C_i$ to drop
below $k/2^j$ during phase $j$; i.e., once we have $|C_i| = k/2^j$ for some position $i \in [n]$, we no longer
remove colors from $C_i$ at that position and ignore any information that would allow us to do
so.

Each phase is divided into a large number of rounds, where a round consists of querying a
random string $r$ and subsequently identifying the 0-blocks (blocks that contribute 0 to $\mathrm{eq}(z,r)$)
by the coin-weighing argument outlined above.

To simplify the analysis, the random string $r$ is sampled from the same distribution
throughout the entire phase. Specifically, at the beginning of phase $j$ we define the set
$\mathcal{R}_j := C_1 \times \dots \times C_n$, and sample the random string $r$ uniformly at random from $\mathcal{R}_j$ in each
round of phase $j$. Note that we do not adjust $\mathcal{R}_j$ during phase $j$; information about excluded
colors we gain during phase $j$ will only be used in the definition of the set $\mathcal{R}_{j+1}$ in phase $j + 1$.

We now introduce the formal setup for the coin-weighing argument. As before we let
$k' := k/2^{j-1}$ and partition the $n$ positions into $m := 4n/k'$ blocks of size $k'/4$. More formally,
for every $s \in [m]$ we let $B_s := \{(s-1)k'/4 + 1, \dots, sk'/4\}$ denote the indices of block $s$,
and we denote by $v_s := |\{i \in B_s : z_i = r_i\}|$ the contribution of block $B_s$ to $\mathrm{eq}(z,r)$. (Note that
$\sum_{s \in [m]} v_s = \mathrm{eq}(z,r)$.) As indicated above we wish to identify the 0-blocks, that is, the indices
$s \in [m]$ for which $v_s = 0$.

For $y \in \{0,1\}^m$, define $r_y$ as the query that is identical to $r$ on the blocks $B_s$ for which $y_s = 1$,
and identical to the string of dummy colors on all other blocks. Thus $\mathrm{eq}(z, r_y) = \sum_{s \in [m],\, y_s = 1} v_s$.
With this observation, identifying the values $v_s$ from a set of queries of form $r_y$ is equivalent to
a coin-weighing problem in which we have $m$ coins with non-negative integer weights that sum up
to $\mathrm{eq}(z,r)$: Querying $r_y$ in the Mastermind game provides exactly the information we obtain
from weighing the set of coins indicated by $y$.
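
The correspondence between a query of the form $r_y$ and a weighing can be checked on a small instance. In the sketch below the helper names `block_weights` and `masked_query` are assumptions, positions are 0-indexed, and the dummy colors are derived directly from the secret purely for brevity (in the actual strategy they come from the initial $n + 1$ queries).

```python
import random


def eq(z, x):
    return sum(a == b for a, b in zip(z, x))


def block_weights(z, r, block_size):
    """v_s = number of matches between z and r inside block s."""
    return [eq(z[s:s + block_size], r[s:s + block_size])
            for s in range(0, len(z), block_size)]


def masked_query(r, dummy, y, block_size):
    """Build r_y: copy r on blocks with y_s = 1, use dummy colors elsewhere."""
    return [r[i] if y[i // block_size] else dummy[i] for i in range(len(r))]


rng = random.Random(0)
n, k_prime = 16, 8
block_size = k_prime // 4                  # blocks of size k'/4
m = n // block_size                        # m = 4n/k' blocks
z = [rng.randrange(1, k_prime + 1) for _ in range(n)]
dummy = [zi % k_prime + 1 for zi in z]     # shortcut: any color different from z[i]
r = [rng.randrange(1, k_prime + 1) for _ in range(n)]
v = block_weights(z, r, block_size)
y = [rng.randrange(2) for _ in range(m)]
r_y = masked_query(r, dummy, y, block_size)
print(eq(z, r_y), sum(v_s for v_s, y_s in zip(v, y) if y_s))
```

Both printed numbers agree: the Mastermind answer to $r_y$ is exactly the total weight of the coins selected by $y$.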

We will only bother with the coin-weighing if the initial random query of the round satisfies
$\mathrm{eq}(z,r) \le m/2$. (Recall that the expected value of $\mathrm{eq}(z,r)$ is $m/4$.) If this is the case, we
query an appropriate number $f(m)$ of strings of form $r_y$, with $y \in \{0,1\}^m$ sampled uniformly
at random (u.a.r.) and independently. The function $f(m)$ is implicit in the proof of the
coin-weighing result of [GK00]; it is in $O(m / \log m)$ and guarantees that the coin-weighing succeeds
with probability at least $1/2$. Thus with probability at least $1/2$, these queries determine all
values $v_s$ and, in particular, identify all 0-blocks. Note that the inequality $\mathrm{eq}(z,r) \le m/2$ also
guarantees that at least half of the $m$ blocks are 0-blocks.
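
What it means for a set of weighings to determine all weights can be made concrete by brute force on a tiny instance. This sketch does not implement the method of [GK00]; it simply enumerates every weight vector with total at most a given bound and keeps those consistent with the observed answers. The names `weigh` and `consistent_weights` and the sample instance are assumptions.

```python
import random
from itertools import product


def weigh(weights, subset):
    """Total weight of the coins selected by the 0/1 vector subset."""
    return sum(w for w, s in zip(weights, subset) if s)


def consistent_weights(m, max_total, weighings, answers):
    """All non-negative integer vectors of length m with sum <= max_total
    that reproduce every observed weighing answer."""
    result = []
    for v in product(range(max_total + 1), repeat=m):
        if sum(v) <= max_total and all(
                weigh(v, y) == a for y, a in zip(weighings, answers)):
            result.append(v)
    return result


m, max_total = 4, 2
true_v = (0, 1, 0, 1)                        # hypothetical block contributions
rng = random.Random(1)
random_ys = [[rng.randrange(2) for _ in range(m)] for _ in range(3)]
random_answers = [weigh(true_v, y) for y in random_ys]
candidates = consistent_weights(m, max_total, random_ys, random_answers)
print(len(candidates), true_v in candidates)

# Weighing every coin separately always pins down the weights, at a cost of m weighings.
singles = [[int(i == s) for i in range(m)] for s in range(m)]
single_answers = [weigh(true_v, y) for y in singles]
print(consistent_weights(m, max_total, singles, single_answers))
```

The weighings succeed exactly when a single candidate remains; the point of the cited result is that far fewer than $m$ random weighings suffice for this with constant probability.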

We say that a round is successful if $\mathrm{eq}(z,r) \le m/2$ and if the coin-weighing successfully
identifies all 0-blocks. In each successful round, we update the sets $C_i$ as outlined above; i.e.,
for each position $i$ that is in a 0-block and for which $|C_i| > k'/2$ we set $C_i := C_i \setminus \{r_i\}$. Note
that it might happen that $r_i$ is a color that was already removed from $C_i$ in an earlier round
of the current phase, in which case $C_i$ remains unchanged. If a round is unsuccessful we do
nothing and continue with the next round.

This completes the description of our strategy for a given phase. We abandon this color
reduction scheme once $k'$ is less than $\sqrt{n}$. At this point, we simply ask queries sampled uniformly
and independently at random from the current set $\mathcal{R} = C_1 \times \dots \times C_n$. We do so until the answers
uniquely determine the secret code $z$. It follows from Chvátal's result [Chv83] that the expected
number of queries needed for this is $O(n \log k' / \log(n/k')) = O(n)$.

This concludes the description of our strategy. It is summarized in Algorithm 1. Correctness
is immediate from our discussion, and it remains to bound the expected number of queries the
strategy makes.



Algorithm 1: Playing Mastermind with many colors

 1 Determine a dummy color for each position;
 2 foreach $i \in [n]$ do $C_i \leftarrow [k]$;
 3 $j \leftarrow 0$ and $k' \leftarrow k$;
 4 while $k' > \sqrt{n}$ do
 5     $j \leftarrow j + 1$, $k' \leftarrow k/2^{j-1}$, $\mathcal{R}_j \leftarrow C_1 \times \dots \times C_n$, and $m \leftarrow 4n/k'$;
 6     repeat
 7         Select a string $r$ u.a.r. from $\mathcal{R}_j$ and query $\mathrm{eq}(z,r)$;
 8         if $\mathrm{eq}(z,r) \le m/2$ then
 9             for $\ell = 1, \dots, f(m)$ /* $f(m) = \Theta(m / \log m)$ */ do
10                 Sample $y$ u.a.r. from $\{0,1\}^m$ and query $\mathrm{eq}(z, r_y)$;
11             if these $f(m)$ queries determine the 0-blocks of $r$ then
12                 foreach $i \in [n]$ do
13                     if $i$ is in a 0-block and $|C_i| > k'/2$ then $C_i \leftarrow C_i \setminus \{r_i\}$;
14     until $\forall i \in [n] : |C_i| = k'/2$;
15 $\mathcal{R} \leftarrow C_1 \times \dots \times C_n$;
16 Select strings $r$ independently and u.a.r. from $\mathcal{R}$ and query $\mathrm{eq}(z,r)$ until $z$ is determined;
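
The inner repeat-loop of a single phase can be simulated with a short script. This is a simplified sketch under a strong assumption: the coin-weighing step is replaced by an oracle that reads the block contributions $v_s$ directly from the secret, so the script counts rounds rather than Mastermind queries and never experiences a failed weighing. The function name `run_phase` and the instance size are illustrative.

```python
import random


def run_phase(z, C, rng):
    """Halve every candidate set C[i], assuming 0-blocks are identified exactly.

    Returns the number of rounds used. C is modified in place.
    """
    n = len(z)
    k_prime = len(C[0])                   # all sets have size k' at phase start
    block = k_prime // 4                  # block size k'/4
    m = n // block                        # m = 4n/k' blocks
    R = [sorted(c) for c in C]            # R_j stays frozen for the whole phase
    rounds = 0
    while any(len(c) > k_prime // 2 for c in C):
        rounds += 1
        r = [rng.choice(R[i]) for i in range(n)]
        # Oracle shortcut: block contributions v_s computed from the secret.
        v = [sum(z[i] == r[i] for i in range(s, s + block))
             for s in range(0, n, block)]
        if sum(v) > m / 2:
            continue                      # eq(z, r) > m/2: unsuccessful round
        for i in range(n):
            if v[i // block] == 0 and len(C[i]) > k_prime // 2:
                C[i].discard(r[i])        # no effect if r[i] was removed earlier
    return rounds


rng = random.Random(0)
n = k = 16
z = [rng.randrange(1, k + 1) for _ in range(n)]
C = [set(range(1, k + 1)) for _ in range(n)]
rounds = run_phase(z, C, rng)
print(rounds, [len(c) for c in C])
```

Since a round removes at most one color per position, at least $k'/2$ rounds are always needed; the analysis below shows that $O(k')$ rounds suffice in expectation.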



2.3 Proof of Theorem 1

We begin by bounding the expected number of rounds in the jth phase. 

Claim 2. The expected number of rounds required to complete phase $j$ is $O(k') = O(k/2^j)$.

Proof. We first show that a round is successful with probability at least $1/4$. Recall that $\mathrm{eq}(z,r)$
has an expected value of $n/k' = m/4$. Thus, by Markov's inequality, we have $\mathrm{eq}(z,r) \le m/2$
with probability at least $1/2$. Moreover, as already mentioned, the proof of the coin-weighing
result by Grebinski and Kucherov [GK00] implies that our $f(m) = \Theta(m / \log m)$ random
coin-weighing queries identify all 0-blocks with probability at least $1/2$. Thus, in total the probability
for a successful round is at least $1/2 \cdot 1/2 = 1/4$.

We continue by showing that the probability that a successful round decreases the number
of available colors for a fixed position, say position 1, is at least $1/4$. Note that this happens if
$r \in \mathcal{R}_j$ satisfies the following two conditions:

(i) $v_1 = 0$, i.e., block $B_1$ is a 0-block with respect to $r$, and

(ii) $r_1 \in C_1$, i.e., the color $r_1$ has not been excluded from $C_1$ in a previous round of phase $j$.

For (i) recall that in a successful round at least $m/2$ of the $m$ blocks are 0-blocks. It follows
by symmetry that $B_1$ is a 0-block with probability at least $1/2$. Moreover, conditional on (i),
$r_1$ is sampled uniformly at random from the $k' - 1$ colors that are different from $z_1$ and were
in $C_1$ at the beginning of the phase. Thus the probability that $r_1$ is in the current set $C_1$
is $|C_1|/(k' - 1)$, which is at least $1/2$ because we do not allow $|C_1|$ to drop below $k'/2$. We
conclude that, conditional on a successful round, the random query $r$ decreases $|C_1|$ by one with
probability at least $1/2 \cdot 1/2 = 1/4$.

Thus, in total, the probability that a round decreases $|C_1|$ by one is at least $1/4 \cdot 1/4 = 1/16$
throughout our strategy. It follows that the probability that after $t$ rounds in phase
$j$ we still have $|C_1| > k'/2$ is bounded by the probability that in $t$ independent Bernoulli trials
with success probability $1/16$ we observe fewer than $k'/2$ successes. If $t/16 \ge k'$, by Chernoff
bounds this probability is bounded by $e^{-ct}$ for some absolute constant $c > 0$.

Let us now denote the number of rounds phase $j$ takes by the random variable $T$. By a
union bound, the probability that $T \ge t$, i.e., that after $t$ steps at one of the positions $i \in [n]$
we still have $|C_i| > k'/2$, is bounded by $n e^{-ct}$ for $t \ge 16k'$. It follows that

$$\mathbb{E}[T] = \sum_{t \ge 1} \Pr[T \ge t] \le 16k' + \sum_{t > 16k'} n e^{-ct} = 16k' + n e^{-\Omega(k')} = O(k'), \qquad (1)$$

where the last step is due to $k' \ge \sqrt{n} = \omega(\log n)$. □

With Claim 2 in hand, we can bound the total number of queries required throughout our
strategy by a straightforward calculation. 

Proof of Theorem 1. Recall that for each phase $j$ we have $m = 4n/k' = 4n/(k/2^{j-1})$ and that
$f(m) = \Theta(m / \log m)$. Thus by Claim 2 the expected number of queries our strategy asks in phase $j$ is
bounded by

$$O(k') \cdot (1 + f(m)) = O\left(\frac{n}{\log(4n/k')}\right) = O\left(\frac{n}{\log(n/k) + j}\right).$$

It follows that throughout the main part of our strategy we ask an expected number of
queries of at most

$$O(n) \sum_{j=1}^{\log k} \frac{1}{\log(n/k) + j} = O\big(n (\log\log n - \log\log(n/k))\big) = O\left(n \log\left(\frac{\log n}{\log(n/k)}\right)\right).$$

(This calculation is for $k < n$; as observed before, for $k = n$ a very similar calculation yields a
bound of $O(n \log\log n)$.) As the number of queries for determining the dummy colors and for
wrapping up at the end is only $O(n)$, Theorem 1 follows. □

2.4 Derandomization 

The strategy presented in the previous section can be derandomized and implemented as a 
polynomial-time algorithm. 

Theorem 3. The bounds stated in Theorem 1 can be achieved by a deterministic winning
strategy. Furthermore, this winning strategy can be realized in polynomial time.

Proof. The main loop of the algorithm described above uses randomization in two places: for
generating the random string $r$ of each round (line 7 in Algorithm 1), and for generating the
$f(m)$ many random coin-weighing queries $r_y$ used to identify the 0-blocks of $r$ if $\mathrm{eq}(z,r) \le m/2$
(line 10).

The derandomization of the coin-weighing algorithm is already given in the work of Grebinski
and Kucherov [GK00]. They showed that a set of $f'(m) = \Theta(m / \log m)$ random coin-weighing
queries $y^{(1)}, \dots, y^{(f'(m))}$, sampled from $\{0,1\}^m$ independently and uniformly at random, has, with
some positive probability, the property that it distinguishes any two distinct coin-weighing
instances in the following sense: For any two distinct vectors $v, w$ with non-negative integer entries
such that $\sum_{s \in [m]} v_s \le m/2$ and $\sum_{s \in [m]} w_s \le m/2$, there exists an index $j \in [f'(m)]$ for which
$\sum_{s \in [m],\, y^{(j)}_s = 1} v_s \ne \sum_{s \in [m],\, y^{(j)}_s = 1} w_s$. It follows by the probabilistic method that, deterministically,
there is a set $D \subseteq \{0,1\}^m$ of size at most $f'(m)$ such that the answers to the corresponding
coin-weighing queries identify every possible coin-weighing instance. Hence we can replace the $f(m)$
random coin-weighing queries of each round by the $f'(m)$ coin-weighing queries corresponding
to the fixed set $D$.

It remains to derandomize the choice of $r$ in each round. As before we consider $m := 4n/k'$
blocks of size $k'/4$, where $k'$ is the size of the sets $C_i$ at the beginning of a phase. To make
sure that a constant fraction of all queries in a phase satisfy $\mathrm{eq}(z,r) \le m/2$ (compare line 8 of
Algorithm 1), we ask a set of $k'$ queries such that, for each position $i \in [n]$, every color in $C_i$ is
used at position $i$ in exactly one of these queries. (If all sets $C_i$ are equal, this can be achieved
by simply asking $k'$ monochromatic queries.) The sum of all returned scores must be exactly
$n$, and therefore we cannot get a score of more than $m/2 = 2n/k'$ for more than $k'/2$ queries.
In this way we ensure that for at least $k' - k'/2 = k'/2$ queries we get a score of at most $m/2$.
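The counting argument above can be checked on a small instance. The following is a minimal sketch, assuming that every set $C_i$ is given as a list of the same length $k'$; the function names and the concrete parameters are illustrative assumptions and not part of the strategy's formal description.

```python
import random

def eq(z, x):
    """Black-peg answer: number of positions in which z and x agree."""
    return sum(a == b for a, b in zip(z, x))

def covering_queries(color_sets):
    """Build k' queries so that every color of C_i is used at position i exactly once.

    color_sets[i] is the list C_i of colors still possible at position i;
    all lists are assumed to have the same length k'.
    """
    k_prime = len(color_sets[0])
    return [tuple(c[j] for c in color_sets) for j in range(k_prime)]

random.seed(0)
n, k_prime = 16, 8
m = 4 * n // k_prime                      # number of blocks, m = 4n/k'
color_sets = [random.sample(range(100), k_prime) for _ in range(n)]
secret = tuple(random.choice(c) for c in color_sets)

queries = covering_queries(color_sets)
scores = [eq(secret, q) for q in queries]
low = [s for s in scores if s <= m / 2]   # queries with score at most m/2
```

Since each secret color is hit by exactly one query at its position, the scores sum to $n$, which forces at least $k'/2$ of them to be at most $m/2$.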

As in the randomized version of our strategy, in each of these $k'/2$ queries at least half of the
blocks must be 0-blocks. We can identify those by the derandomized coin-weighing discussed
above. Consider now a fixed block. As it has size $k'/4$, it can be a non-0-block in at most $k'/4$
queries. Thus it is a 0-block in at least $k'/2 - k'/4 = k'/4$ of the queries.

To summarize, we have shown that by asking $k'$ queries of the above form we get at least $k'/2$
queries of score at most $m/2$. For each of them we identify the 0-blocks by coin-weighing queries.
This allows us to exclude at least $k'/4$ colors at each position. That is, as in the randomized version of
our strategy, we can reduce the number of colors by a constant factor using only $O(k' \cdot m/\log m) =
O(n/\log(4n/k'))$ queries. By similar calculations as before, the same asymptotic bounds follow.

We abandon the color reduction scheme when $k'$ is a constant. At this point, we can solve
the remaining problem in time $O(n)$ by repeatedly using the argument we used to determine
the dummy colors in Section 2.2.1.



Note that all of the above can easily be implemented in polynomial time if we can solve the
coin-weighing subproblems in polynomial time. An algorithm for doing the latter is given in the
work of Bshouty [Bsh09]. Using this algorithm as a building block, we obtain a deterministic
polynomial-time strategy for Codebreaker that achieves the bounds stated in Theorem 1.

□ 



3 Mastermind with Black and White Answer-Pegs 

In this section, we analyze the Mastermind game in the classic version with both black and
white answer-pegs. Interestingly, there is a strong general connection between the two versions.
Roughly speaking, we can use a strategy for the $k = n$ black-peg game to learn which colors
actually occur in the secret code of a black/white-peg game with $n$ positions and $n^2$ colors.
Having thus reduced the number of relevant colors to at most $n$, Codebreaker can again use a
$k = n$ black-peg strategy (ignoring the white answer-pegs) to finally determine the secret code.
More precisely, for all $k, n \in \mathbb{N}$ let us denote by $b(n, k)$ the minimum (taken over all
strategies) maximum (taken over all secret codes) expected number of queries needed to find the
secret code in a black-peg Mastermind game with $k$ colors and $n$ positions. Similarly, denote
by $bw(n,k)$ the corresponding number for the game with black and white answer-pegs. Then
we show the following.

Theorem 4. For all $k, n \in \mathbb{N}$ with $k \ge n$,

$bw(n, k) = \Theta(k/n + b(n, n))$.

Combining this with Theorem 1, we obtain a bound of $O(n \log\log n)$ for black/white
Mastermind with $n \le k \le n^2 \log\log n$ colors, improving all previous bounds in that range.

For the case $k \le n$ it is not hard to see that $bw(n,k) = \Theta(b(n,k))$, see Corollary 6 below.
Together with Theorem 4, this shows that to understand black/white-peg Mastermind for all $n$
and $k$, it suffices to understand black-peg Mastermind for all $n$ and $k$.

Before proving Theorem 4, let us derive a few simple preliminary results on the relation of
the two versions of the game.

Lemma 5. For all $n, k$,

$bw(n, k) \ge b(n, k) - k + 1$.

Proof. We show that we can simulate a strategy in the black/white Mastermind game by one
receiving only black-peg answers and using $k - 1$ more guesses. Fix a strategy for black/white
Mastermind. Our black-peg strategy first asks $k - 1$ monochromatic queries. This tells us how
often each of the $k$ colors occurs in the secret code (the count of the remaining color follows
because all counts sum to $n$). From now on, we can play the strategy for
the black/white game. While we only receive black answer-pegs, we can compute the number of
white pegs we would have gotten in the black/white game from the just obtained information
on how often each color occurs in the code. With this information available, we can indeed play
as in the given strategy for black/white Mastermind. □
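The simulation in this proof can be sketched as follows. The code is an illustration under our own assumptions: colors are numbered $1, \ldots, k$, the black/white answer is modeled as a pair (black pegs, white pegs), and all function names are hypothetical.

```python
from collections import Counter

def black(z, x):
    """Black-peg answer: number of positions where z and x agree."""
    return sum(a == b for a, b in zip(z, x))

def black_white(z, x):
    """Reference answer of the black/white game as (black pegs, white pegs)."""
    b = black(z, x)
    cz, cx = Counter(z), Counter(x)
    total = sum(min(cz[c], cx[c]) for c in cx)
    return b, total - b

def learn_color_counts(z, k):
    """Ask k - 1 monochromatic black-peg queries; the last count follows from the sum n."""
    n = len(z)
    counts = {c: black(z, (c,) * n) for c in range(1, k)}
    counts[k] = n - sum(counts.values())
    return counts

def simulated_answer(z, x, counts):
    """Reconstruct the black/white answer from a black-peg answer and the color counts."""
    b = black(z, x)                      # the only access to the secret code
    cx = Counter(x)
    total = sum(min(counts[c], cx[c]) for c in cx)
    return b, total - b
```

After the $k - 1$ preliminary queries, every query of the black/white strategy costs exactly one black-peg query, which is the source of the additive $k - 1$ term.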

Lemma 5 will be used to prove that the $b(n, n)$ term in the statement of Theorem 4 cannot
be avoided. As a corollary, it yields that white answer-pegs are not extremely helpful when
$k = O(n)$.

Corollary 6. For all $k \le n$,

$bw(n, k) = \Theta(b(n, k))$.

Proof. Obviously, $bw(n,k) \le b(n,k)$ for all $n, k$. If $k = o(n)$, then the information-theoretic
lower bound $b(n, k) = \Omega(n \log k/\log n)$ is of larger order than $k$, hence the lemma above shows
the claim. For $k = \Theta(n)$, note first that both $b(n,k)$ and $bw(n,k)$ are in $\Omega(n)$ due to the
information-theoretic argument. If $b(n, k) = O(n)$, there is nothing to show. If $b(n, k) = \omega(n)$,
we again invoke Lemma 5. □

In the remainder of this section, we prove Theorem 4. To describe the upper bound, let
us fix the following notation. Let $C$ be the set of all available colors and $k = |C|$. Denote by
$z \in C^n$ the secret code chosen by Codemaker. Denote by $C^* := \{z_i \mid i \in [n]\}$ the (unknown)
set of colors in $z$.

Codebreaker's strategy leading to the bound of Theorem 4 consists of roughly these three
steps.

(1) Codebreaker first asks roughly $k/n$ guesses containing all colors. Only colors in a guess
receiving a positive answer can be part of the secret code, so this reduces the number of colors
to be regarded to at most $n^2$. Also, Codebreaker can learn from the answers the cardinality $n'$
of $C^*$, that is, the number of distinct colors in the secret code.
(2) By asking an expected number of $O(n')$ (dependent) random queries, Codebreaker learns
$n'$ disjoint sets of colors of size at most $n$ such that each color of $C^*$ is contained in exactly one
of these sets. Denote by $k'$ the cardinality of a largest of these sets.

(3) Given such a family of sets, Codebreaker can learn $C^*$ with an expected number of
$b(n', k')$ queries by simulating an optimal black-peg Mastermind strategy. Once $C^*$ is known,
an expected number of $b(n, n')$ queries determine the secret code, using an optimal black-peg
strategy for $n'$ colors.

Each of these steps is made precise in the following. Before doing so, we remark that, after
a single initial query, Codebreaker can determine $|C^* \cap X|$ for any set $X$ of at most $n$ colors via a single
Mastermind query answered by black and white answer-pegs.

Lemma 7. For an arbitrary set $X$ of at most $n$ colors, let $\mathrm{col}(X) := |C^* \cap X|$, the number
of colors of $X$ occurring in the secret code. After a single initial query, Codebreaker can learn
$\mathrm{col}(X)$ for any $X$ via a single Mastermind query to be answered by black and white pegs.

Proof. As the single initial query, Codebreaker may ask $(1, \ldots, 1)$, the code consisting of color 1
only. Denote by $b$ the number of black pegs received (there cannot be a white answer-peg).
This is the number of occurrences of color 1 in the secret code.

Let $X \subseteq C$, $\nu := |X| \le n$. To learn $\mathrm{col}(X)$, Codebreaker extends $X$ to a multiset of $n$
colors by adding the color 1 exactly $n - \nu$ times and guesses a code arbitrarily composed of this
multiset of colors. Let $y$ be the total number of (black and white) answer-pegs received. Then
$\mathrm{col}(X) = y - \min\{n - \nu, b\}$ if $1 \notin X$ or $b = 0$, and $\mathrm{col}(X) = y - \min\{n - \nu, b - 1\}$ otherwise. □
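The case distinction in this proof is easy to get wrong, so a small executable sketch may help. It assumes colors are positive integers with color 1 as the padding color; `total_pegs` and `color_query` are names chosen here for illustration.

```python
from collections import Counter

def total_pegs(z, x):
    """Total number of black and white answer-pegs for guess x against secret z."""
    cz, cx = Counter(z), Counter(x)
    return sum(min(cz[c], cx[c]) for c in cx)

def color_query(z, X, b):
    """Return col(X) = |C* ∩ X| using a single black/white query.

    X is a collection of at most n distinct colors, and b is the number of
    occurrences of color 1 in z, learned from the initial query (1, ..., 1).
    """
    n, v = len(z), len(X)
    guess = list(X) + [1] * (n - v)      # pad X with color 1 to a multiset of n colors
    y = total_pegs(z, guess)             # the only access to the secret code
    if 1 not in X or b == 0:
        return y - min(n - v, b)
    return y - min(n - v, b - 1)
```

For instance, with the secret code `(1, 1, 4, 7, 4)` and `b = 2`, the call `color_query(z, (1, 2, 3, 4, 5), 2)` evaluates the guess `(1, 2, 3, 4, 5)` itself, since no padding is needed.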

To ease the language, we shall call a query determining $\mathrm{col}(X)$ a color query. We now show
that using roughly $k/n$ color queries, Codebreaker can learn the number $|C^*|$ of different colors
occurring in the secret code and exclude all but $n|C^*|$ colors.

Lemma 8. With $\lceil k/n \rceil$ color queries, Codebreaker can learn both $|C^*|$ and a superset $C_0$ of $C^*$
consisting of at most $n|C^*|$ colors.

Proof. Let $X_1, \ldots, X_{\lceil k/n \rceil}$ be a partition of $C$ into sets of cardinality at most $n$. By asking
the corresponding $\lceil k/n \rceil$ color queries, Codebreaker immediately learns $|C^*| = \sum_{i=1}^{\lceil k/n \rceil} \mathrm{col}(X_i)$.
Also, $C_0 := \bigcup \{X_i \mid \mathrm{col}(X_i) > 0\}$ is the desired superset. □
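As an illustration of Lemma 8, the partition step can be sketched as below. The color-query oracle `col` is passed in as a callable; in the actual game each call would be realized by one black/white query as in Lemma 7. The function name and return format are our own assumptions.

```python
def reduce_colors(colors, n, col):
    """Partition the color list into ceil(k/n) parts and ask one color query per part.

    Returns (|C*|, superset C_0 of C*, number of color queries used).
    """
    parts = [colors[i:i + n] for i in range(0, len(colors), n)]  # parts of size <= n
    answers = [col(part) for part in parts]
    num_distinct = sum(answers)                                  # |C*|
    superset = [c for part, a in zip(parts, answers) if a > 0 for c in part]
    return num_distinct, superset, len(parts)

# Hypothetical instance: n = 4 positions, k = 50 colors.
secret = (3, 17, 3, 40)
oracle = lambda X: len(set(secret) & set(X))    # stands in for a real color query
num_distinct, c0, used = reduce_colors(list(range(1, 51)), 4, oracle)
```

Every part that contributes to $C_0$ contains at least one color of $C^*$, which is why $|C_0| \le n|C^*|$.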

Lemma 9. Assume that Codebreaker knows the number $n' = |C^*|$ of different colors in $z$ as
well as a set $C_0 \supseteq C^*$ of colors such that $|C_0| \le n|C^*|$.

Then with an expected number of $\Theta(n')$ color queries, Codebreaker can find a family
$C_1, \ldots, C_{n'}$ of disjoint subsets of $C_0$, each of size at most $\lceil |C_0|/n' \rceil \le n$, such that $C^* \subseteq
C_1 \cup \ldots \cup C_{n'}$ and $|C^* \cap C_i| = 1$ for all $i \in [n']$.

Proof. Roughly speaking, Codebreaker's strategy is to ask color queries having an expected
answer of one. With constant probability, such a query contains exactly one color from $C^*$.
Algorithm 2 gives a precise formulation of this strategy.

For the analysis, note first that the value of $k'$ during the application of the above
strategy does not increase. In particular, all sets $C_i$ defined and queried have cardinality at most
$\lceil |C_0|/n' \rceil \le n$. It is also clear that the above strategy constructs a sequence of disjoint $C_i$ and
that for each color occurring in $z$ there is exactly one $C_i$ containing this color.

It remains to prove the estimate on the expected number of queries. To this aim, we first
note that throughout a run of this strategy, $n'$ is the number of colors of $C^*$ left in $C_0$. Hence
Algorithm 2: Codebreaker's strategy

while $n' > 0$ do
    $k' \leftarrow \lceil |C_0|/n' \rceil$;
    Let $C_{n'}$ be a random subset of $C_0$ with $|C_{n'}| = k'$;
    Ask the color query $C_{n'}$;
    if $\mathrm{col}(C_{n'}) = 1$ then
        $C_0 \leftarrow C_0 \setminus C_{n'}$;
        $n' \leftarrow n' - 1$;


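the event "$\mathrm{col}(C_{n'}) = 1$" occurs with probability

$$
\begin{aligned}
\frac{n' k' (|C_0| - n') \cdots (|C_0| - n' - k' + 2)}{|C_0| \cdots (|C_0| - k' + 1)}
&\ge \frac{(|C_0| - n') \cdots (|C_0| - n' - k' + 2)}{(|C_0| - 1) \cdots (|C_0| - k' + 1)} \\
&\ge \left(\frac{|C_0| - n' - k' + 2}{|C_0| - k' + 1}\right)^{k'-1}
 = \left(1 - \frac{n' - 1}{|C_0| - k' + 1}\right)^{k'-1} \\
&\ge \left(1 - \frac{n' - 1}{|C_0| - (|C_0|/n')}\right)^{k'-1} \\
&\ge \left(1 - \frac{|C_0|/(k'-1) - 1}{|C_0| - (|C_0|/n')}\right)^{k'-1} \\
&\ge \left(1 - \frac{1}{(k'-1)(1 - 1/n')}\right)^{k'-1},
\end{aligned}
$$

which is bounded from below by a constant (the later estimates assume $n' \ge 2$; for $n' = 1$ the
second term of the sequence of inequalities already is one).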

Consequently, with constant probability the randomly chosen $C_{n'}$ satisfies "$\mathrm{col}(C_{n'}) = 1$".
Hence after an expected constant number of iterations of the while-loop, such a $C_{n'}$ will be
found. Since each such success reduces the value of $n'$ by one, a total expected number of
$O(|C^*|)$ iterations suffices to find the desired family of sets $(C_i)_{i \in [n']}$. □
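The strategy analyzed in this proof can be simulated directly. The sketch below follows Algorithm 2 step by step; the oracle `col`, the random generator argument, and the small instance at the end are assumptions made for illustration.

```python
import math
import random

def split_colors(c0, n_prime, col, rng):
    """Find n' disjoint subsets of C_0, each containing exactly one color of C*.

    col(X) must return |C* ∩ X|; every call corresponds to one color query.
    Returns the family of sets and the number of color queries asked.
    """
    c0 = list(c0)
    family, queries = [], 0
    while n_prime > 0:
        k_prime = math.ceil(len(c0) / n_prime)
        candidate = rng.sample(c0, k_prime)      # random subset of C_0 of size k'
        queries += 1
        if col(candidate) == 1:                  # success: exactly one secret color
            family.append(candidate)
            chosen = set(candidate)
            c0 = [c for c in c0 if c not in chosen]
            n_prime -= 1
    return family, queries

# Hypothetical instance: C* = {5, 12, 33} inside C_0 = {1, ..., 36}.
c_star = {5, 12, 33}
oracle = lambda X: len(c_star & set(X))
family, queries = split_colors(range(1, 37), 3, oracle, random.Random(3))
```

Failed iterations leave $C_0$ and $n'$ unchanged, so only the successful queries shrink the problem, as in the analysis above.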

Given a family of sets as just constructed, Codebreaker can simulate a black-peg strategy 
to determine C*. 

Lemma 10. Let $C_1, \ldots, C_{n'}$ be a family of disjoint subsets of $C$ such that $C^* \subseteq C_1 \cup \ldots \cup C_{n'}$
and $|C^* \cap C_i| = 1$ for all $i \in [n']$. Assume that $k' := \max\{|C_i| \mid i \in [n']\} \le n$. Then Codebreaker
can detect $C^*$ using an expected number of $b(n', k')$ color queries.

Proof. Let $z' \in C_1 \times \ldots \times C_{n'}$ be the unique such string consisting of colors in $C^*$ only. Note
that in black-peg Mastermind, the particular sets of colors used at each position are irrelevant.
Hence there is a strategy for Codebreaker to detect $z'$ using an expected number of $b(n', k')$
guesses from $C_1 \times \ldots \times C_{n'}$ and receiving black-peg answers only.

We now show that for each such query, there is a corresponding color query in the (n, k) 
black/white Mastermind game giving the same answer. Hence we may simulate the black-peg 
game searching for z' by such color queries. Since z' contains all colors of C* and no other 
colors, once found, it reveals the set of colors occurring in the original secret code z. 

Let $y' \in C_1 \times \ldots \times C_{n'}$ be a query in the black-peg Mastermind game searching for $z'$. For
each position $i \in [n']$, we have $z'_i = y'_i$ if and only if $y'_i \in C_i$ is the unique color from $C_i$ that is
in $C^*$. As moreover the sets $(C_i)_{i \in [n']}$ are disjoint, we have $\mathrm{eq}(z', y') = \mathrm{col}(\{y'_1, \ldots, y'_{n'}\})$, and
we can obtain this value (i.e., the black-peg answer for the guess $y'$ relative to $z'$) by a color
query relative to $z$. □
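The identity $\mathrm{eq}(z', y') = \mathrm{col}(\{y'_1, \ldots, y'_{n'}\})$ can be verified exhaustively on a toy family of sets. In the sketch below the sets and the secret colors are made-up example data, and `col` is computed directly rather than through an actual color query.

```python
from itertools import product

def eq(z, y):
    """Black-peg answer: number of positions where z and y agree."""
    return sum(a == b for a, b in zip(z, y))

def col(c_star, colors):
    """Number of colors of the given collection that occur in C*."""
    return len(set(c_star) & set(colors))

family = [[1, 2, 3], [4, 5], [6, 7, 8]]      # disjoint sets C_1, C_2, C_3 (example data)
c_star = {2, 4, 8}                           # exactly one secret color per set
z_prime = tuple(next(c for c in part if c in c_star) for part in family)

# Compare both quantities for every possible black-peg query y'.
mismatches = [y for y in product(*family) if eq(z_prime, y) != col(c_star, y)]
```

Disjointness matters here: if two sets shared a color of $C^*$, a single color of $y'$ could be counted by `col` without matching at its own position.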

Note that if our only goal is to find out $C^*$, then for $k \ll n^2$ we can be more efficient
by asking more color queries in Lemma 8, leading to a smaller set $C_0$, to smaller sets $C_i$ in
Lemma 9, and thus to a smaller $k'$ value in Lemma 10. Since this will not affect the asymptotic
bound for the total numbers of queries used in the black/white-peg game, we omit the details.

Proof of Theorem 4. The upper bound follows easily from applying Lemmas 7 to 10, which show
that Codebreaker can detect the set $C^*$ of colors arising in the secret code $z$ with an expected
number of $1 + \lceil k/n \rceil + O(n) + b(n,n)$ guesses. Since $|C^*| \le n$, he can now use a strategy for
black-peg Mastermind and determine $z$ with another expected number of $b(n, n)$ guesses. Note
that $b(n,n) = \Omega(n)$, so this proves the upper bound.

We argue that this upper bound is optimal apart from constant factors. Assume first that
the secret code is a random monochromatic string (Codemaker may even announce this). Fix
a (possibly randomized) strategy for Codebreaker. With probability at least 1/2, this strategy
does not use the particular color in any of the first $k/(2n)$ guesses. It then also did not guess
the correct code. Hence the expected number of queries necessary to find the code is at least
$k/(4n)$.

We finally show that for $k \ge n$, also the $b(n, n)$ term cannot be avoided. By the
information-theoretic argument, there is nothing to show if $b(n,n) = O(n)$. Hence assume $b(n,n) = \omega(n)$.
We will show $bw(n, k) + n + 1 \ge bw(n, n)$. The claim then follows from $bw(n, n) = \Theta(b(n, n))$
(Corollary 6).

We show that we can solve the $k = n$ color Mastermind game by asking $n + 1$ preliminary
queries and then simulating a strategy for black/white Mastermind with $n$ positions and $k \ge n$ colors.
As in Section 2.2.1, we use $n + 1$ queries to learn for each position whether it has color 1 or
not. We then simulate a given strategy for $k \ge n$ colors as follows. In a $k$-color query, replace
all colors greater than $n$ by color 1. Since we know the positions of the pegs in color 1, we
can reduce the answers by the contribution of these additional 1-pegs in the query. This gives
the answer we would have gotten in reply to the original query (since the secret code does not
contain colors higher than $n$). Consequently, we can now simulate the $k$-color strategy in an
$n$-color Mastermind game. □



4 A Tight Lower Bound for Non-Adaptive Strategies 

When analyzing the performance of non-adaptive strategies, it is not very meaningful to ask 
for the number of queries needed until the secret code is queried for the first time. Instead we 
ask for the number of queries needed to identify it. 



In their work on the 2-color black-peg version of Mastermind, Erdős and Rényi [ER63]
showed that random guessing needs, with high probability, $(2 + o(1))n/\log n$ queries to identify
the secret code, and that this is in fact best possible among non-adaptive winning strategies. The
upper bound was derandomized by Lindström [Lin64, Lin65] and, independently, by Cantor and
Mills [CM66]. That is, for 2-color black-peg Mastermind a deterministic non-adaptive winning
strategy using $(2 + o(1))n/\log n$ guesses exists, and no non-adaptive strategy can do better.
For adaptive strategies, only a weaker lower bound of $(1 + o(1))n/\log n$ is known. This
bound results from the information-theoretic argument mentioned in Section 1.2. It remains a
major open problem whether there exists an adaptive strategy that achieves this bound. In fact,
it is not even known whether adaptive strategies can outperform the random guessing strategy
by any constant factor.
In this section we prove that for Mastermind with $k = \Theta(n)$ colors, adaptive strategies
are indeed more powerful than non-adaptive ones, and outperform them even in order of
magnitude. More precisely, we show that any non-adaptive strategy needs $\Omega(n \log n)$ guesses. Since
we know from Section 2 that adaptively we can achieve a bound of $O(n \log\log n)$, this separates
the performance of non-adaptive strategies from that of adaptive ones. Our result answers a
question left open in [God03].



The $\Omega(n \log n)$ bound for non-adaptive strategies is tight; as we will show in Lemma 12
below, it can be achieved by simple random guessing.

For the formal statement of the bound, we use the following notation. A deterministic
non-adaptive strategy is a fixed ordering $x^1, x^2, \ldots, x^{k^n}$ of all possible guesses, i.e., the elements of
$[k]^n$. A randomized non-adaptive strategy is a probability distribution over such orderings. For
a given secret code $z \in [k]^n$, we ask for the smallest index $j$ such that the queries $x^1, \ldots, x^j$
together with their answers $\mathrm{eq}(z, x^1), \ldots, \mathrm{eq}(z, x^j)$ uniquely determine $z$. Mastermind with
non-adaptive strategies is also referred to as static Mastermind [God03].
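To make the quantity "smallest index $j$" concrete, the following brute-force sketch computes it for tiny instances by filtering the set of codes consistent with the answers seen so far. Colors are numbered $0, \ldots, k-1$ here, and the function name is our own; the approach is exponential in $n$ and only meant as an illustration.

```python
from itertools import product

def eq(z, x):
    """Black-peg answer: number of positions where z and x agree."""
    return sum(a == b for a, b in zip(z, x))

def identification_index(z, queries, k):
    """Smallest j such that x^1..x^j and their answers uniquely determine z.

    queries is the fixed (non-adaptive) ordering of guesses; returns None if
    the given queries never single out z.
    """
    candidates = list(product(range(k), repeat=len(z)))
    for j, x in enumerate(queries, start=1):
        answer = eq(z, x)
        candidates = [c for c in candidates if eq(c, x) == answer]
        if len(candidates) == 1:
            return j
    return None

# Fixed ordering of all guesses for n = 2 positions and k = 2 colors.
ordering = list(product(range(2), repeat=2))
```

With this ordering, the codes `(0, 0)` and `(1, 1)` are identified after the first query, while `(0, 1)` and `(1, 0)` need two queries.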



Theorem 11. For any (randomized or deterministic) non-adaptive strategy for black-peg
Mastermind with $n$ positions and $k$ colors, the expected number of queries needed to determine a
secret code $z$ sampled uniformly at random from $[k]^n$ is $\Omega(n \log k / \max\{\log(n/k), 1\})$.

Theorem 11 shows, in particular, that for any non-adaptive strategy there exists a secret
code $z \in [k]^n$ which can only be identified after $\Omega(n \log k / \max\{\log(n/k), 1\})$ queries. For $k \ge n$,
this is an improvement of $\Theta(\log n)$ over the information-theoretic lower bound mentioned in the
introduction. For the case $k = \Theta(n)$, Theorem 11 gives a lower bound of $\Omega(n \log n)$ guesses for
every non-adaptive strategy, showing that adaptive strategies are indeed more powerful than
non-adaptive ones in this regime (recall Theorem 1).



To give an intuition for the correctness of Theorem 11, note that for a uniformly chosen
secret code $z \in [k]^n$, for any single fixed guess $x$ of a non-adaptive strategy the answer $\mathrm{eq}(z,x)$
is binomially distributed with parameters $n$ and $1/k$. That is, $\mathrm{eq}(z,x)$ will typically be within
the interval $n/k \pm O(\sqrt{n/k})$. Hence, we can typically encode the answer using $\log(O(\sqrt{n/k})) =
O(\log(n/k))$ bits. Or, stated differently, our 'information gain' is usually $O(\log(n/k))$ bits.
Since the secret code 'holds $n \log k$ bits of information', we would expect that we have to make
$\Omega(n \log k/\log(n/k))$ guesses.
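This heuristic can be explored numerically. The sketch below computes the exact entropy, in bits, of a binomially distributed answer and divides the $n \log k$ bits of the secret code by it; the parameter values $n = 1000$, $k = 10$ are arbitrary choices for illustration, and the resulting number is only the heuristic estimate, not a proven bound.

```python
import math

def binomial_entropy_bits(n, p):
    """Exact Shannon entropy (in bits) of a Binomial(n, p) random variable."""
    h = 0.0
    for j in range(n + 1):
        # Work with log-probabilities to avoid overflow in the binomial coefficient.
        log_pmf = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                   + j * math.log(p) + (n - j) * math.log(1 - p))
        pmf = math.exp(log_pmf)
        if pmf > 0:
            h -= pmf * math.log2(pmf)
    return h

n, k = 1000, 10
answer_bits = binomial_entropy_bits(n, 1 / k)   # information gained per guess
code_bits = n * math.log2(k)                    # information held by the secret code
heuristic_guesses = code_bits / answer_bits     # heuristic estimate of guesses needed
```

The per-guess entropy stays far below the trivial bound $\log(n+1)$ for an answer in $\{0, \ldots, n\}$, which is the gap the formal proof exploits.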

To turn this intuition into a formal proof, we recall the notion of entropy: For a discrete
random variable $Z$ over a domain $D$, the entropy of $Z$ is defined by $H(Z) := -\sum_{z \in D} \Pr[Z =
z] \log(\Pr[Z = z])$. Intuitively speaking, the entropy measures the amount of information that
the random variable $Z$ carries. If $Z$ for example corresponds to a random coin toss with
Pr['heads'] = Pr['tails'] = 1/2, then $Z$ carries 1 bit of information. However, a biased coin toss
with Pr['heads'] = 2/3 carries less (roughly 0.918 bits of) information since we know that the
outcome of heads is more likely. Furthermore, for random variables $Z, Y$ over domains $D_Z, D_Y$
the conditional entropy of $Z$ conditional on $Y$ is defined by $H(Z|Y) := -\sum_{z \in D_Z, y \in D_Y} \Pr[Z =
z \wedge Y = y] \log(\Pr[Z = z \mid Y = y])$. Intuitively, the conditional entropy measures the additional
information that $Z$ holds if we already know the outcome of $Y$. For example, if $Z$ is completely
determined by the outcome of $Y$, we have $H(Z|Y) = 0$. On the other hand, if $Z$ is independent
of $Y$, we have $H(Z|Y) = H(Z)$. In our proof we use the following properties of the entropy,
which can easily be seen to hold for any two random variables $Z, Y$ over domains $D_Z, D_Y$.

(E1) If $Z$ is uniformly distributed then $H(Z) = \log(|D_Z|)$,

(E2) $H((Z,Y)) \le H(Z) + H(Y)$,

(E3) $H((Z, Y)) = H(Y) + H(Z|Y) = H(Z) + H(Y|Z)$, and

(E4) if $Z$ is determined by the outcome of $Y$, i.e., $Z = f(Y)$ for a deterministic function $f$,
then we have $H(Z|Y) = 0$ and thus by (E3) also $H(Y) = H(Z) + H(Y|Z) \ge H(Z)$.

Property (E1) is a characterization of uniform distributions. That is, $H(Z) = \log(|D_Z|)$ if and
only if $Z$ is the uniform distribution over $D_Z$. The inequality in (E2) holds with equality if and
only if the two variables $Z$ and $Y$ are independent.
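The definitions and the coin examples above can be reproduced with a few lines of code. This is a minimal sketch that assumes logarithms to base 2 (so that entropy is measured in bits) and represents distributions as dictionaries; both choices are ours.

```python
import math
from collections import defaultdict

def entropy(dist):
    """H(Z) in bits for a distribution given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def conditional_entropy(joint):
    """H(Z|Y) in bits for a joint distribution given as {(z, y): probability}."""
    p_y = defaultdict(float)
    for (_, y), p in joint.items():
        p_y[y] += p                      # marginal distribution of Y
    return -sum(p * math.log2(p / p_y[y]) for (_, y), p in joint.items() if p > 0)

fair = {'heads': 1 / 2, 'tails': 1 / 2}
biased = {'heads': 2 / 3, 'tails': 1 / 3}

# Z independent of Y: two independent fair coins.
independent = {(z, y): 1 / 4 for z in (0, 1) for y in (0, 1)}
# Z determined by Y: Z is the parity of a uniform Y in {0, 1, 2, 3}.
determined = {(y % 2, y): 1 / 4 for y in range(4)}
```

Here `entropy(biased)` evaluates to roughly 0.918, `conditional_entropy(independent)` equals the entropy of a single fair coin, and `conditional_entropy(determined)` is zero, in line with (E3) and (E4).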



Proof of Theorem 11. Since every randomized Mastermind strategy is a finite convex
combination of deterministic ones, it suffices to prove Theorem 11 for deterministic strategies.

Let $S = (x^1, x^2, \ldots)$ denote a deterministic strategy of Codebreaker. We first show a lower
bound on the number of guesses that are needed to identify at least half of all possible secret
codes. For $j = 1, \ldots, k^n$, let $A_j = A_j(S) \subseteq [k]^n$ denote the set of codes that can be uniquely
determined from the answers to the queries $x^1, \ldots, x^j$. Let $s$ be the smallest index for which
$|A_s| \ge k^n/2$.

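Consider a code $Z \in [k]^n$ sampled uniformly at random, and set $Y_i := \mathrm{eq}(Z, x^i)$, $1 \le i \le s$.
Moreover, let

$$\tilde{Z} := \begin{cases} Z & \text{if } Z \in A_s, \\ \text{'fail'} & \text{if } Z \notin A_s. \end{cases}$$

By our definitions, the sequence $Y := (Y_1, Y_2, \ldots, Y_s)$ determines $\tilde{Z}$, and hence by (E4) we have

$$H(\tilde{Z}) \le H(Y). \qquad (2)$$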

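We derive a lower bound on $H(\tilde{Z})$. Let $\mathbb{1}_{A_s}$ denote the indicator random variable for the event
that $Z \in A_s$. Since $\mathbb{1}_{A_s}$ is determined completely by $\tilde{Z}$ we have by (E4) that

$$
\begin{aligned}
H(\tilde{Z}) &= H(\mathbb{1}_{A_s}) + H(\tilde{Z} \mid \mathbb{1}_{A_s}) \\
&\ge H(\tilde{Z} \mid \mathbb{1}_{A_s}) \\
&\ge -\sum_{z \in A_s} \Pr[Z = z] \log(\Pr[Z = z \mid Z \in A_s]) \\
&= \frac{|A_s|}{k^n} \log(|A_s|) \\
&\ge \frac{1}{2} \log\left(\frac{1}{2} k^n\right) = \Omega(n \log k). \qquad (3)
\end{aligned}
$$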

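We now derive an upper bound on $H(Y)$. For every $i$, $Y_i$ is binomially distributed with
parameters $n$ and $1/k$. Therefore, its entropy is (see, e.g., [JS99])

$$H(Y_i) = \frac{1}{2} \log\left(2\pi e \frac{n}{k}\left(1 - \frac{1}{k}\right)\right) + \frac{1}{2} + O\left(\frac{1}{n}\right) = O(\max\{\log(n/k), 1\}).$$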



We thus obtain

$$H(Y) \overset{\text{(E2)}}{\le} \sum_{i=1}^{s} H(Y_i) = s H(Y_1) = s \cdot O(\max\{\log(n/k), 1\}). \qquad (4)$$



Combining (2), (3), and (4), we obtain

$$s = \Omega\left(\frac{n \log k}{\max\{\log(n/k), 1\}}\right).$$
Since, by definition of $s$, at least half of all secret codes in $[k]^n$ can only be identified by the
strategy $S$ after at least $s$ guesses, it follows that the expected number of queries needed to
identify a uniformly chosen secret code is at least $s/2$. □



Chvátal [Chv83] proved that the bound given in Theorem 11 is tight if $k \le n^{1-\varepsilon}$, $\varepsilon > 0$ a
constant. Here we show the same for $k = \Theta(n)$. We do so by analyzing the random guessing
strategy. As in the case $k \le n^{1-\varepsilon}$, random guessing turns out to be optimal up to constant
factors. Whereas the case considered by Chvátal requires substantial work, the bound for
$k = \Theta(n)$ follows from a routine coupon collector argument.

Lemma 12. For black-peg Mastermind with $n$ positions and $k = \Theta(n)$ colors, the random
guessing strategy needs an expected number of $O(n \log n)$ queries to determine an arbitrary
fixed code $z \in [k]^n$. Furthermore, for a large enough constant $C$, $Cn \log n$ queries suffice with
probability $1 - o(1)$.

Proof. We can easily eliminate colors whenever we receive a 0-answer. For every position
$i \in [n]$ we need to eliminate $k - 1$ potential colors. This can be seen as having $n$ parallel coupon
collectors, each of which needs to collect $k - 1$ coupons.

The probability that for a random guess we get an answer of 0 is $(1 - 1/k)^n$, i.e., constant.
Conditional on a 0-answer, the color excluded at each position is sampled uniformly from all
$k - 1$ colors that are wrong at that particular position. Thus the probability that at least one
of the $k - 1$ wrong colors at one fixed position is not eliminated by the first $t$ many 0-answers
is bounded by $(k - 1)(1 - 1/(k-1))^t \le k e^{-t/k}$.

Let now $T$ denote the random variable that counts the number of 0-answers needed to
determine the secret code. By a union bound over all $n$ positions, we have $\Pr[T > t] \le
n k e^{-t/k} = O(n^2) \cdot e^{-\Theta(t/n)}$. It follows by routine calculations that $E[T] = O(n \log n)$ and
$\Pr[T > Cn \log n] = o(1)$ for $C$ large enough. As a random query returns a value of 0 with
constant probability, the same bounds also hold for the total number of queries needed. □
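The coupon-collector view in this proof translates into a short simulation. The sketch below uses only the 0-answers, exactly as in the argument above, so the number of queries it reports is an upper bound on what is needed to identify the code; colors are numbered $0, \ldots, k-1$ and the function name is an illustrative choice.

```python
import random

def eq(z, x):
    """Black-peg answer: number of positions where z and x agree."""
    return sum(a == b for a, b in zip(z, x))

def random_guessing(z, k, rng):
    """Guess uniformly at random and eliminate colors after every 0-answer.

    Returns the identified code and the total number of queries asked.
    """
    n = len(z)
    possible = [set(range(k)) for _ in range(n)]   # colors not yet excluded per position
    queries = 0
    while any(len(p) > 1 for p in possible):
        x = [rng.randrange(k) for _ in range(n)]
        queries += 1
        if eq(z, x) == 0:                          # every color of x is wrong at its position
            for i, c in enumerate(x):
                possible[i].discard(c)
    return tuple(next(iter(p)) for p in possible), queries
```

Each 0-answer removes at most one color per position, so at least $k - 1$ queries are always required by this procedure; answers other than 0 are simply ignored in this sketch.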